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For  functions  tabulated  at  Chebyshev  nodes  on  an  interval,  spectral  interpolation,  integration  and 
differentiation  can  be  performed  stably  and  efficiently  via  the  fast  Fourier  transform.  In  this  paper, 
a  group  of  algorithms  is  presented  for  the  efficient  evaluation  of  Lagrange  polynomial  interpolants 
at  multiple  points  on  the  line,  and  for  the  rapid  spectral  integration  and  differentiation  of  functions 
tabulated  at  nodes  other  than  Chebyshev.  The  interpolation  scheme  requires  0{N  •  log(l/e)) 
arithmetic  operations,  and  0(N  -  log  N  +  N  •  log(l/£))  operations  are  required  for  the  integration 
and  differentiation  schemes,  where  e  is  the  precision  of  computations  and  N  is  the  number  of 
nodes.  The  algorithms  utilize  efficient  versions  of  the  fast  multipole  method  which  have  been 
designed  specifically  for  one-dimensional  problems;  these  are  also  described  in  the  present  paper. 
Several  experiments  are  included  to  illustrate  the  numerical  performance  of  the  approach. 
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1  Introduction 


Polynomials  form  the  theoretical  basis  for  many  areas  of  applied  mathematics.  While  the 
mathematical  properties  of  polynomials  have  been  quite  well  understood  for  over  a  century, 
attempts  to  use  them  in  practical  calculations  have  met  with  difficulties.  Typical  problems 
accompanying  the  employment  of  classical  schemes  are  those  of  prohibitive  computational  cost 
and  numerical  instability.  However,  certain  classes  of  orthogonal  polynomials  do  have  stable, 
fast  transforms  associated  with  them,  the  most  popular  being  Chebyshev  polynomials  which 
can  be  manipulated  in  a  stable  and  efficient  manner  via  the  fast  cosine  transform  or  FFT. 

In  this  paper  we  present  a  group  of  three  algorithms  for  the  interpolation,  integration 
and  differentiation  of  functions  tabulated  at  nodes  other  than  Chebyshev.  The  first  of  these 
algorithms  takes  as  input  a  set  of  points,  {xi,. .  .,xjv},  and  a  set  of  function  values,  {/i, . . 
and  evaluates  the  unique  interpolating  polynomial  of  degree  IV  —  1  at  the  points  {yi , . . . ,  Vn}  for 
a  computational  cost  proportional  to  N.  The  other  two  algorithms  perform  spectral  integration 
and  differentiation  of  this  interpolant.  We  will  also  describe  three  efficient  versions  of  the  Fast 
Multipole  Method  (FMM)  for  one  dimensional  problems;  these  algorithms  are  used  by  the 
polynomial  interpolation  algorithm  of  this  paper.  Throughout  this  paper  we  will  be  using 
the  well  known  Lagrange  representation  of  interpolating  polynomials  which  is  defined  by  the 
formula 

N  JV 

w=E/>-n^‘ 

j=i  > 

A  simple  algebraic  manipulation  converts  (1)  to  the  form 


Xk 


(1) 


N  N  f  N  , 

p"(*) = n  (*-**)•  e  ■  n  ^r^T’ 

*:=i 


j=lx~xi  k=l  X3  ~  Xk 
k£j 


(2) 


which  can  be  evaluated  at  N  points  in 


O  (./V  -  log  (“)) 


(3) 


arithmetic  operations  using  the  Fast  Multipole  Method  (FMM)  of  [10]  (£  here  is  the  desired 
accuracy).  In  comparison,  a  direct  evaluation  of  (2)  requires  0(N 2)  operations. 

Remark  1.1  A  somewhat  different  classical  polynomial  interpolation  problem  consists  of  de¬ 
termining  a  set  of  parameters  {aj, . . .,  ajv}  such  that,  for  j  =  1, . . .,  N, 


N 


(4) 


Jfc=i 


where  {xi,...,x/v}  is  a  given  a  set  of  points  and  {/i,-  •  -,/n}  is  a  given  set  of  function  val¬ 
ues.  This  problem  is  highly  ill-posed  for  anything  other  than  very  small  values  of  N  and  this 
formulation  is  seldom  used  when  actual  calculations  are  being  performed. 
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Remark  1.2  The  Lagrange  interpolation  formula  has  traditionally  been  less  favored  for  prac¬ 
tical  calculations  than  other  classical  methods  (see,  for  example,  [14]).  However,  the  algorithms 
of  this  paper  are  numerically  stable  and  very  efficient,  as  demonstrated  by  our  numerical  ex¬ 
periments,  thus  affording  the  Lagrange  formula  a  substantial  advantage  over  other  techniques 
for  the  manipulation  of  polynomials. 

Following  is  a  plan  of  the  paper.  The  first  three  sections  are  devoted  to  efficient  versions  of 
the  FMM  which  can  be  used  to  evaluate  expressions  of  the  form  (2):  Section  2  contains  a  number 
of  results  from  analysis  which  are  used  in  the  design  of  these  algorithms,  in  Section  3  we  present 
the  FMM  algorithm  itself,  and  in  Section  4  we  present  an  adaptive  version  of  this  algorithm. 
A  Fast  Polynomial  Interpolation  algorithm  utilizing  the  results  of  the  previous  sections  is  then 
described  in  Section  5,  and  in  Section  6  we  describe  how  this  algorithm  is  used  to  construct 
fast  algorithms  for  spectral  integration  and  differentiation.  In  Section  7  we  present  the  results 
of  several  of  our  numerical  experiments,  and  finally,  in  Section  8  we  list  several  generalizations 
and  conclusions  of  the  results  of  this  paper. 


2  Mathematical  and  Numerical  Preliminaries 


The  algorithms  of  this  paper  are  based  on  several  results  from  the  Chebyshev  approximation 
theory  of  the  function  1/x.  This  analysis  is  presented  in  the  Lemmas  and  Theorems  of  this 
section,  numbered  2.1-2.10.  The  main  results  of  this  section  fall  into  two  categories:  Theo¬ 
rems  2.5  and  2.7  describe  how  the  function  1/x  can  be  approximated  on  different  regions  of 
the  real  line  using  Chebyshev  expansions,  and  Theorems  2.8,  2.9  and  2.10  provide  three  ways 
of  manipulating  these  expansions  which  are  needed  by  the  fast  algorithms  of  this  paper. 

We  begin  with  three  classical  definitions  which  can  be  found,  for  example,  in  [8],  [14]. 

Definition  2.1  The  n-th  degree  Chebyshev  polynomial,  Tn(x),  is  defined  by  the  following  for¬ 
mulae: 

Tn(x)  =  cos(narccosz),  (5) 

Tn(x)  =  \  •  ((*  +  +  (*  -  V^l)n)  ■  (6) 

Definition  2.2  The  Chebyshev  nodes  {ti,.  of  order  n  on  the  interval  [—1, 1]  are  defined 
by  the  formulae 

(2k  —1  v\  ... 

“  -  - cos  '  V  (7) 

for  k  =  1,. . .,n. 


Definition  2.3  ttj, . . . ,  u„  will  be  a  set  of  polynomials  of  order  n  -  1  defined  by  the  formulae 


ufit)  =  fj 


*=i 


t-tk 
tj-tk ’ 


(8) 


for  j  =  1, . . n,  where  tk  are  defined  by  (7). 
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For  a  function  /  :  [-1, 1]  — ►  C,  the  order  n—  1  Chebyshev  approximation  to  /  on  the  interval 
[—1, 1]  is  the  unique  polynomial  of  order  n  —  1  which  agrees  with  /  at  the  nodes  ij, . .  ,,t„.  There 
exist  several  standard  representations  for  this  polynomial,  and  the  one  we  will  use  in  this  paper 
is  given  by  the  expression 

i:  /ft)  •«;(«)•  (») 

3- 1 

For  the  purposes  of  this  paper,  Chebyshev  expansions  for  any  function  will  be  characterized  by 
values  of  this  function  tabulated  at  Chebyshev  nodes. 

Lemmas  2.1-2.3  provide  estimates  involving  Chebyshev  expansions  which  are  used  in  the 
remainder  of  this  section.  The  proof  of  Lemma  2.1  is  obvious  from  (5). 

Lemma  2.1  Let  T„(i)  be  the  Chebyshev  polynomial  of  degree  n.  Then, 

|Tn(x)|  <  1  (10) 


for  any  x  €  [-1,1]. 

Lemma  2.2  Let  Tn(x)  be  the  Chebyshev  polynomial  of  degree  n.  Then, 


|T„(*)|  >  ^ 


5x 

T 


for  any  x  such  that  |x|  >  3. 

Proof.  From  Definition  2.1,  we  have 

|Tn(*)|  =  |  -  |(X  +  Vx2TT)n  +  (x-v4^1r| 

>  \  •  i* + V*2  -  (*/3>2r  =  \  •  i*  •  a + 


>  - 


5x  |n 
3 


(11) 


(12) 


for  any  x  such  that  ]z|  >  3. 


Lemma  2.3  Let  uj(x)  be  defined  by  (8).  Then,  for  any  x  €  [-1,1], 

M*)l  <  1.  (13) 

Proof.  It  is  obvious  from  (8)  that  Uj(tj)  =  1,  and  that  Uj(tk)  =  0  when  k  ^  j.  In  addition, 
the  expression 

]EW,)  Ji(.)  (14) 

k= l 


3 


is  also  equal  to  1  at  tj  and  equal  to  0  at  ail  other  Since  both  u:  and  (14)  are  polynomials 
of  order  n  —  1,  we  have 


(15) 


jt=i 


for  j  =  l,...,n.  Furthermore,  due  to  the  combination  of  (15)  and  the  triangle  inequality,  we 
obtain 


M*)l  = 


k=i 


(16) 

l7tife=i  I  7tjfe=i 

for  any  x  6  [—1, 1].  □ 

The  following  lemma  provides  an  identity  which  is  used  in  the  proof  of  Theorem  2.5. 

Lemma  2.4  Suppose  that  n  >  2,  and  that  b  >  0  and  Xo  are  real  numbers  with  |io|  >  3 b.  Then, 
for  all  x, 

i  /  \  V'  1  f x\  _  (x/b)  fiv\ 

(x  Xo)%b^rrQ'Uj\b)-Tn(xofby  (17) 

Proof.  Let  Q(x)  by  the  polynomial  of  degree  n  defined  by  the  formula 

Using  (8)  we  obtain 

Q(btk )  =  1  -  (btk  -x0)-^2  Tf—  —  •  tt j(tk) 

jZ \  bti  ~  x° 


(18) 


=  1  -  (btk  -  2o)  ■ 


btk  -  Xq 

for  k  ss  1,. . .,».  Clearly,  then,  Q(x)  satisfies  the  conditions 

Q(xo)  =  1 

Q(bt  i)  =  0 

Q(btn)  =  0. 


=  0, 


(19) 


(20) 


It  is  clear  that  the  function  Tn(x/b)/Tn(xo/b)  is  also  a  polynomial  of  degree  n  which  satisfies 
the  n  +  1  conditions  (20).  Therefore, 


0(l)  =  TJ^ibY 

and  (17)  follows  as  an  immediate  consequence  of  (18)  and  (21). 


(21) 

□ 
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Theorem  2.5  Suppose  that  n  >  2,  and  that  b  >  0  and  xo  are  real  numbers  with  |xo|  >  36. 
Then, 

1 _ 1  /*\ 

x-x0  “  btj  -  xq  i\b) 

for  any  x  €  [—6,  b]. 

Proof.  Due  to  Lemma  2.4,  we  have 


b  •  5n 


(22) 


1 

X  -  Xo 


Also,  due  to  Lemmas  2.1  and  2.2,  we  have 


1  ITn(x/b)\ 

|x  -  xo!  '  |Tn(xo/6)|' 


(23) 


for  any  x  €  [-6,6],  and 


|T*(x/6)|  <  1 
5xo 


|r.(*o/»)|  >  i 


36 


5" 


for  any  |xq|  >  36.  It  follows  from  the  combination  of  (23),  (24)  and  (25)  that 


<A.A<  1 


■  ir  ^ 


-  26  5"  ~  6  •  5" 


(24) 

(25) 

(26) 


for  any  x  €  [—6,6].  □ 

The  following  lemma  provides  an  identity  which  is  used  in  the  proof  of  Theorem  2.7. 

Lemma  2.6  Suppose  that  n  >  2,  and  that  6  >  0  and  xq  are  real  numbers  with  |xo|  <  6.  Then, 
for  all 


C  flh  C  1  V"1  tj  fc\  —  ^  ^n(£) 

£  (3  £xo) '  E  36  _  tjXQ  ■  -  l0  ■  r„(36/x0)  ■ 


(27) 


Proof.  Let  Q(f)  be  the  polynomial  of  degree  n  defined  by  the  formula 

Q(0  =  f  -  (36  -  e*o)  •  E  36_fif,Xo  •  (28) 

Using  (8)  we  obtain 

<?(«*)  =  ~  (36  -  tfcxp)  •  E  •  »i(«fc) 

=  4-  (36  -  t*x0)  •  36  _**  =  0,  (29) 
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for  fc  =  1, . . .,n.  Clearly,  then,  Q(£)  satisfies  the  conditions 


Q(3b/xo )  =  3b/ xq 
Q(h)  =  o 


Q(tn)  =  0. 


It  is  clear  that  the  function 

36  rn(  o 

lo  Tn(36/x0) 

is  also  a  polynomial  of  degree  n  which  satisfies  the  n  +  1  conditions  (30).  Therefore, 


Ml 

Tn(3b/xoy 


and  (27)  follows  as  an  immediate  consequence  of  (28)  and  (32). 


(30) 


(31) 


(32) 

□ 


Theorem  2.7  Suppose  that  n  >  2,  and  that  b  >  0  and  xq  are  real  numbers  with  |x0|  <  6. 
Then, 

i  i 

3 


_L _ A  tj  (3b\ 

-X0  ^  36  -  tjX 0  Uj  \  X  ) 


6-5" 


for  any  x  such  that  |x|  >  36. 

Proof.  Writing  £  =  3b/ x,  we  have  |£|  <  1  whenever  |x|  >  36,  and 

1  =  1  =  _J_ 

x  -  xq  3b/£-xo  36-fxo 


Due  to  Lemma  2.6,  we  have 


i  36  |r„(f)| 


J  36  -  £xo  ^  36  -  tjX o 
In  addition,  due  to  Lemmas  2.1  and  2.2  we  have 

36  •  |r„(OI  <  36 

for  £  €  [-1, 1],  and 


|36  -  £iol  |*o|  |T„(36/x0)l 


|*o  •  Tn(36/x0)|  > 


!*ol 


5-36 


3xq 


B  5n  •  6 
>  — — 


for  |x0|  <  6.  Substituting  (36)  and  (37)  into  (35),  we  obtain 


36  -  £xo  36  -  tjx o 


1  36-2  3 

<  rr 


26  5n  •  6  6  •  5n 


(33) 


(34) 

(35) 

(36) 

(37) 

(38) 
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for  f  G  [-1, 1].  Now  it  follows  from  the  combination  of  (34)  and  (38)  that 


1 


X  —  Xq 


n 


-E 


36  tjx0 


(39) 

□ 


The  following  three  theorems  provide  formulae  for  translating  along  the  real  axis  Chebyshev 
expansions  of  the  type  described  in  the  previous  two  theorems.  Theorem  2.8  provides  a  formula 
for  translating  expansions  described  in  Theorems  2.5,  Theorem  2.9  describes  a  mechanism  of 
converting  the  expansion  of  Theorem  2.5  to  the  expansion  of  Theorem  2.7,  and  Theorem  2.10 
provides  a  way  of  translating  the  expansion  of  Theorem  2.7.  These  theorems  are  one- dimensional 
counterparts  of  the  three  theorems  in  [10]  which  provide  translation  operators  for  multipole 
expansions  in  the  complex  plane. 

Theorem  2.8  Suppose  that  n  >  2  and  let  c,d  be  a  pair  of  real  numbers  such  that  [c  —  <f,  c  +  d]  C 
[— 1, 1].  Then ,  for  any  set  of  complex  numbers  ®i, . . . ,  and  any  x  €  [c-d,e  +  d\, 

■  Uj(x)  =  5Z  (**  '  ui(c  +  dtk))  ■  u:  (~r)  ■  (4°) 

J=1  fc=l  \  a  / 

Proof.  To  prove  this  theorem  we  first  show  that 

uj(x)  =  ui(c  +  dtk)  •  ( 41 ) 

fora:  €  [c  —  d,c  +  d].  Indeed,  the  right  hand  sideof(41)is  simply  the(n  —  l)-th  degree  Lagrange 
interpolating  polynomial  for  the  function  Uj(x)  at  the  nodes  c+dtx, . .  .,c  +  dt„.  However,  Uj(x) 
itself  is  a  polynomial  of  degree  n  —  1,  and  is  therefore  equal  to  its  Lagrange  interpolant  of  order 
n  -  1. 

The  formula  (40)  then  follows  as  an  immediate  consequence  of  (41).  □ 


Theorem  2.9  Suppose  that  n  >  2  and  let  c,d  be  a  pair  of  real  numbers  such  that  |c|  —  d  >  3. 
Let  the  function  f  :  R  — »  C  be  defined  by  the  formula 


N 

/<*)  =  E 


fc=i 


c*k 

X  -  Xk 


(42) 


where  x*  €  [—1,1]  fork  =  1  and  Qi,...,a^  is  a  set  of  complex  numbers.  Further,  let 

h , . . . ,  tn  be  Chebyshev  nodes  defined  by  (7),  let  , . . . ,  be  a  set  of  complex  numbers  defined 
by  the  formula 


*k  =  f 


(43) 
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for  k  =  1, . . n,  and  let  . . . ,  Vn  be  a  set  of  complex  numbers  defined  by  the  formula 

Then,  for  any  x  £  [c  -  d,  c  +  d], 

.  ■Ar  _  /i  —  c\|  3n  -f  1 

)|<ATyr' 

where  A  =  l<*k|- 

Proof.  Due  to  the  triangle  inequality, 

< S\  +  S2, 


(45) 


k=l 


where 


and 


S\  = 


52  = 


/(*)  “  S  f(c  +  dtk )  •  UJ  (^7^)  |  ’ 


(46) 

(47) 

(48) 


Combining  Theorem  2.5  with  the  triangle  inequality  we  obtain 

s,<r^’ 

and  from  the  combination  of  Theorem  2.7,  Lemma  2.3  and  the  triangle  inequality,  we  obtain 


(49) 


k=i 


3  An 
b- 5"’ 


(50) 


where  A  =  |at|.  Finally,  substituting  (49)  and  (50)  into  (46)  we  have 

t!  \  T  ( ®  _  l  4  3fl  4"  1 

/w-E*,.»,(— )|<a._ 


(51) 


for  any  x  €  [c  -  d,  c  +  d). 


Theorem  2.10  Suppose  that  n  >  2  and  let  c,d  be  a  pair  of  real  numbers  such  that  [c-d,  c+d]  D 
[—1,1].  Let  the  function  /  :  R  — »  C  6e  defined  by  the  formula 


N 


<*k 


fc=l  1  ** 


(52) 
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where  Xk  €  [  -1, 1]  for  k  —  1, . . . ,  N,  and  £*i, . . . ,  a \  is  a  set  of  complex  numbers.  Further,  let 
*!»•••»*«  b  Chebys.^ev  nodes  defined  by  (7),  let  $i, . . . ,  be  a  set  of  complex  numbers  defined 
by  the  jormula 

*-'G) 

for  k  =  1, . .  - ,  n,  and  let  .  ,$n  be  a  set  of  complex  numbers  defined  by  the  formula 

“  :)• 


j= i 

Then,  for  any  x  such  that  |x  -  c|  >  3d, 


3d  +  ctk , 


(53) 


(54) 


nx)-pt.Ui(JL)\<A.«!±. 

where  A  =  J^k=i 

Proof.  It  follows  from  the  triangle  inequality  that 

f(x )  ~  5Z  ‘  ui  { - )  I  <  Si  +  52 

t?i  \x-tJ\ 


1) 


(55) 


where 


and 


Si  = 


52 


£('K)-*-)-  (A)|- 


Combining  Theorem  2.7  with  the  triangle  inequality,  we  obtain 

3A' 


Si  < 


6  •  5n  ’ 


(56) 

(57) 

(58) 

(59) 


and  from  the  combination  of  Theorem  2.7,  Lemma  2.3  and  the  triangle  inequality,  we  obtain 


k= 1 


3  An 
6  •  5"  ’ 


(60) 


where  4  |o*|.  Finally,  substituting  (59)  and  (60)  into  (56)  we  have 


(61) 


for  any  x  such  that  |z  -  cj  >  3d. 
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3  The  Fast  Multipole  Method  in  One  Dimension 


In  this  section  we  consider  the  problem  of  computing  the  sums 


N 


(62) 


for  j  —  1, . .  .,N,  where  {zi, . .  .,xjv)  and  {y\, . .  .,Vn}  are  sets  of  real  numbers,  and  {aj,  .  .,a/v} 
and  {/i, . .  -  ,/at)  are  sets  of  complex  numbers.  This  problem  can  be  viewed  as  the  special  case 
of  the  JV-body  problem  in  physics  where  we  wish  to  evaluate  the  electrostatic  field  due  to  N 
charges  which  lie  on  a  straight  line  at  a  set  of  points  on  this  line. 


Remark  3.1  For  the  remainder  of  this  paper,  we  shall  assume  without  loss  of  generality  that 
Zi,Vi  €  [-1,1]  for  *  =  1,...,JV. 


Remark  3.2  The  fast  multipole  algorithm  of  [10]  computes  sums  of  a  more  general  form  than 
(62)  in  0(N )  arithmetic  operations.  This  more  general  form  is  described  by  the  formulae 


N 


4=1 


zk 


(63) 


for  j  =  1, . . . ,  N,  where  {zj, , .  .,2^}  and  {u>i, . . . ,  tujv}  are  sets  of  complex  numbers.  From  a 
physical  viewpoint,  this  corresponds  to  the  evaluation  of  the  electrostatic  field  due  to  N  charges 
which  lie  in  the  plane.  While  the  two  and  three  dimensional  scenarios  for  the  JV-body  problem 
have  been  discussed  in  some  depth  (see,  for  example,  [4],  [10]),  the  analysis  and  applications  of 
one  dimensional  problems  appear  to  have  been  largely  overlooked,  with  one  exception  of  which 
we  are  aware:  the  application  of  FMM  techniques  to  various  problems  in  numerical  linear 
algebra  (see  [11],  [12]). 

In  this  section  we  present  an  0{N)  algorithm  for  the  one- dimensional  problem  which  is  based 
on  the  two-dimensional  FMM,  but  incorporates  a  number  of  modifications  which  accelerate  the 
scheme  significantly.  We  summarize  these  modifications  below,  with  the  assumption  that  the 
reader  is  familiar  with  [10]. 

1.  Replacement  of  all  complex  by  complex  multiplications  with  real  by  complex  multiplica¬ 
tions. 

2.  Replacement  of  multipole  expansions  with  Chebyshev  expansions.  Chebyshev  series  are 
known  to  converge  more  rapidly  than  multipole  expansions  (which  are  actually  Taylor 
series). 

3.  Further  compression  of  the  Chebyshev  expansions  by  a  suitable  change  of  basis  (see  Sec¬ 
tion  3.4).  Using  this  technique,  the  function  \/x  can  be  accurately  represented  by  about  a 
quarter  of  the  number  of  coefficients  which  were  required  by  the  original  two-dimensional 
FMM. 
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4.  In  one  dimension,  each  subinterval  has  3  other  subintervals  in  its  interaction  list,  whereas 
in  tT  o  dimensions  each  box  has  27  other  boxes  in  its  interaction  list. 

5.  In  one  dimension,  each  subinterval  has  2  nearest  neighbor  subintervals,  whereas  in  two 
dimensions  each  box  has  8  nearest  neighbor  boxes. 

This  section  is  divided  into  four  parts.  Sections  3.1-3.3  are  devoted  to  an  algorithm  for  the 
FMM  in  one  dimension  which  uses  Chebyshev  expansions  in  place  of  multipole  expansions.  In 
Section  3.4,  we  describe  a  more  efficient  algorithm  which  is  based  on  the  algorithm  of  Section  3.3 
but  uses  a  Singular  Value  Decomposition  to  further  compress  the  Chebyshev  expansions. 

Remark  3.3  The  algorithms  described  in  this  paper  are  all  designed  for  evaluating  sums  of 
the  form 

=  (64) 
k=l  * 

However,  these  algorithms  are  only  mildly  dependent  on  the  choice  of  kernel,  i.e.  they  can  be 
modified  to  evaluate  expressions  of  a  more  general  form,  given  by  the  formula 

N 

/(*)  =  *52  ak  •  <t>(x  -  xk),  (65) 

fc=i 

where  </>  is  singular  at  0  but  smooth  everywhere  else.  Examples  of  functions  with  this  property 
include  <t>(x)  =  log(x),  <f>(x)  =  1/x 2  and  4>(x)  =  l/tan(x). 

3.1  General  Strategy 

We  will  illustrate  by  means  of  a  simple  example  how  Chebyshev  expansions  can  be  used  to 
evaluate  expressions  of  the  form  (62)  more  efficiently.  We  will  also  give  an  informal  description 
of  how  the  method  of  this  simple  example  is  used  in  the  construction  of  a  fast  algorithm  for 
the  general  case. 

First  we  introduce  a  definition  which  formalizes  the  notion  of  well-separated  intervals  on 
the  real  line.  This  is  simply  the  one- dimensional  analog  of  the  definition  of  well-separatedness 
in  [10]. 

Definition  3.1  Let  {ii,...,z^}  and  {yi,...,y\f}  be  two  sets  of  points  in  R.  We  say  that  the 
sets  {x,}  and  {y,}  ore  well-separated  if  there  exist  points  xo,  yo  €  R  and  a  real  r  >  0  such  that 

|x<  -  x0|  <  r  V  i  =  1, .  ..,N, 

|y,  -  yol  <  r  V  i  =  1,...,M,  and  (66) 

1*0  -  2/o I  >  4r. 

Suppose  now  that  {xi, . .  .,x\}  and  {yi, . . yjif}  are  well-separated  sets  of  points  in  R  (see 
Figure  1),  that  {oi,..., a^r}  is  a  set  of  complex  numbers,  and  that  we  wish  to  compute  the 
numbers  /(yi), . . .,  f{y\i)  where  the  function  /  :  R  -*  C  is  defined  by  the  formula 


/(*>  =  £  7? 
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Vo 
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Figure  1:  Well-separated  intervals  on  the  line. 


A  direct  evaluation  of  (67)  at  the  points  {j/i,...,Pa*}  requires  O(NM)  arithmetic  opera¬ 
tions.  We  will  describe  two  different  ways  of  speeding  up  this  calculation. 

Following  is  the  first  of  these  approaches. 

Let  the  function  fi  :  R  — ►  C  be  defined  by  the  formula 


N 


A(x)=£"‘-£^(^ 


*o) 


(68) 


where  p  is  an  integer  and  t\ , . . . , tp  and  ui,...,up  are  given  by  Definitions  2.2  and  2.3. 


Observation  3.4  From  the  combination  of  (67),  (68),  Theorem  2.7  and  the  triangle  inequality, 
we  see  that 

|rt«)-/l(«)|<*'^|ntl  (69) 

for  any  x  such  that  \x  —  i0|  >  3r. 

Now  let  {$!, . . . ,  3>p}  be  a  set  of  complex  numbers  defined  by  the  formulae 


N 

fe=i 


3r  -  tj(xk  -  *o) 


(70) 


for  j  =  1, . .  .,p.  Then, 

/.w  =  E*)«i(^)-  <71> 

The  vector  $  will  be  referred  to  as  the  far-held  expansion  for  the  interval  [io  -  r,  i0  +  r]. 

Computation  of  the  coefficients  requires  0{Np)  operations,  and  a  subsequent  evaluation 
of  /i(yi),...,/i(3/Af)  is  an  O(Mp)  procedure.  The  total  computational  cost  of  approximating 
(67)  to  a  relative  precision  1/5P  is  then  0(Np  -f-  Mp)  operations. 

An  alternative  way  of  speeding  up  this  calculation  is  described  below. 

Let  the  function  /2  :  R  — »  C  be  defined  by  the  formula 


AW-t  •>-tjr=fe= 

k=l  j= 1 


yo) 


(72) 


where  p  is  an  integer  and  ti,...,tp  and  uj , . . . , are  given  by  Definitions  2.2  and  2.3. 
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level 


3  I - 1 - 1 - 1 - f 


Figure  2:  Hierarchy  of  subintervals. 


Observation  3.5  From  the  combination  of  (67),  (72),  Theorem  2.5  and  the  triangle  inequality, 
we  see  that 

|/M-/2(*)|<-fej,atl  (73) 

for  any  x  such  that  \x  —  y0|  <  r- 

Now  let  {¥i, . . $p}  be  a  set  of  complex  numbers  defined  by  the  formulae 


N 

*,  =  X> 

k=l 


rtj~(xk~xo)' 


(74) 


for  j  =  1, . . .  ,p.  Then, 

=  (75) 

j=l  \  r  / 

The  vector  $  will  be  referred  to  as  the  local  expansion  for  the  interval  [yo  —  r,  yo  +  r]. 

Computation  of  the  coefficients  requires  O(Np)  operations,  and  a  subsequent  evaluation 
of  /2(j/i),  •  •  .,/2(yAf)  is  an  O(Mp)  procedure.  Again  the  total  computational  cost  of  approxi¬ 
mating  (67)  to  a  relative  precision  1/5*  is  0(Np+  Mp)  operations. 

Consider  now  the  general  case,  where  the  points  {xi , . . . ,  xn}  and  {yj , . . . ,  y\f}  are  arbitrar¬ 
ily  distributed  on  the  interval  [—1, 1]  (see  Remark  3.1).  We  use  a  hierarchy  of  grids  to  subdivide 
the  computational  domain  [—1,1]  into  progressively  smaller  subintervals,  and  to  subdivide  the 
sets  {xj}  and  {y;}  according  to  subinterval  (see  Figure  2).  A  tree  structure  is  imposed  on  this 
hierarchy,  so  that  the  two  subintervals  resulting  from  the  bisection  of  a  larger  (parent)  interval 
are  referred  to  as  its  children.  Two  Chebyshev  expansions  are  associated  with  each  subinterval: 
a  far-field  expansion  for  the  points  within  the  subinterval,  and  a  local  expansion  for  the  points 
which  are  well-separated  from  the  subinterval.  Interactions  between  pairs  of  well-separated 
subintervals  can  be  computed  via  these  Chebyshev  expansions  in  the  manner  described  above, 
and  all  other  interactions  at  the  finest  level  can  be  computed  directly.  Once  the  precision  has 
been  fixed,  the  computational  cost  of  the  entire  procedure  is  O(N)  operations  (for  a  detailed 
description  and  complexity  analysis,  see  Section  3.3). 
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3.2  Notation 

In  this  section  we  introduce  the  notation  to  be  used  in  the  description  of  the  algorithm  below. 

•  p  will  be  an  integer  denoting  the  size  of  Chebyshev  expansions  used  by  the  algorithm. 
Normally,  p  =  [— log^s)],  where  £  is  the  desired  precision  of  computations. 

•  will  denote  Chebyshev  nodes  of  order  p  on  the  interval  [—1,1],  defined  by  the 
formulae 


for  i  =  1, ...,p. 

•  . . . ,  up(t)  will  denote  the  set  of  polynomials  defined  by  the  formulae 

*A0  =  f[  rzjr-  (”) 

Jk=l Tj  tk 
k*i 

for  j  =  1  ,...,p. 

•  s  will  be  an  integer  denoting  the  number  of  points  in  each  subinterval  at  the  finest  level 
of  subdivision.  Normally,  s  «  2p  (see  Remark  3.6). 

•  nlevs  =  pog2(W/s)]  will  denote  the  level  of  finest  subdivision  of  the  interval  [—1, 1]. 

•  will  be  a  p-vector  denoting  the  fax-field  expansion  for  the  subinterval  i  at  level  l. 

•  will  be  a  p-vector  denoting  the  local  expansion  for  the  subinterval  i  at  level  1. 

•  Ml  and  Mr  will  be  p  x  p  matrices  for  obtaining  far-field  expansions  for  subintervals  in 
terms  of  the  far-field  expansions  of  their  children.  These  will  be  defined  by  the  formulae 

m-j)  =  ”<(2^).  <78> 

which  are  obtained  from  the  formula  (54)  of  Theorem  2.10  applied  to  the  cases 
c  =  —  l,d  =  2  and  c  =  l,d  =  2. 

•  Sl  and  Sr  will  be  p  x  p  matrices  for  obtaining  local  expansions  for  subintervals  in  terms 
of  the  local  expansions  of  their  parent.  These  will  be  defined  by  the  formulae 

Sd'J)  =  «/  (^j-l)  . 

(79) 

which  are  obtained  from  the  formula  (40)  of  Theorem  2.8  applied  to  the  cases 
c  =  -\,d  =  \  and  c  =  ^,d  = 
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•  Ti,  T2,  T3  and  T4  will  be  p  x  p  matrices  for  obtaining  local  expansions  from  far-field 
expansions.  These  will  be  defined  by  the  formulae 


Ui,j)  = 

Uj  ( 

^  CO  ' 
CO  I  < 

T2(i,j)  = 

Uj  ( 

T3(»\j)  = 

Uj  ( 

Ji+ 4)’ 

(80) 

T4(iJ)  = 

Uj  ( 

3  ^ 
ji  +  6 )  ' 

which  axe  obtained  from  the  formula  (44)  of  Theorem  2.9  applied  to  the  cases 
c  =  -6,  d  —  1,  c  —  -4,  d  =  1,  c  =  4,  d  =  1  and  c  =  6,  d  =  1. 


3.3  Description  of  the  Algorithm 

Following  is  a  formal  description  of  an  algorithm  for  the  efficient  evaluation  of  expressions  of 
the  form  (62). 


Algorithm  3.1 

Step  Complexity  Description 

1  0(1)  Comment  [Input  problem  size,  N,  and  a  real  number  e  >  0.] 

Set  size  of  Chebyshev  expansions  p  =  f—  log 5(e)] ,  choose  s  and  set  level  of  refine¬ 
ment  nlevs  =  flog2(lV/s)). 


2  O(Np)  Comment  [Determine  far-field  expansions  for  subintervals  at  finest  level.] 

do  i  =  1, .  ..,2nlevt 

Compute  p-term  far-field  expansion  $niev,i  due  to  the  subset  of  {xj} 
which  lie  in  subinterval  i  at  level  nlevs. 
enddo 


3 


0(2Np7/s ) 


Comment  [Determine  p-term  far-field  expansions  for  each  subinterval  at  every 
level  by  shifting  and  adding  far-field  expansions  of  the  subinterval’s  children.] 


do  l  =  nlevs  —  1, . . .,  1 
do  i  =  1, . . . ,  2* 

$l,«  =  Ml  ■  $/+i,2i-i  +  Mr  ■  $i+i,2i 

enddo 

enddo 


4  0(SNp2/s)  Comment  [Determine  p-term  local  expansions  for  each  subinterval  at  every  level 

by  first  shifting  local  expansion  of  the  subinterval’s  parent,  and  then  adding  the 
interactions  with  the  three  subintervals  which  are  well-separated  from  the  subin¬ 
terval,  but  which  have  not  been  accounted  for  at  the  parent’s  level.] 

do  l  =  l,...,nlevs  —  1 
do  i  =  1, . .  .,2* 

$»+1.2«'-l  =  Sl  •  $l,i  +  Ti  ■  $1+1,21-3  +  Tz  ■  $J+l,2i+l  +  ?4  •  $/+l,2i+2 

$;+l,2«  =  Sr  ■  $|,i  +  T)  ■  $I+l,2i-3  +  Tz  •  $l+l,2t-2  +  T4  ■  $|+l,2i+2 
enddo 
enddo 

5  O(Np)  Comment  [Evaluate  local  expansions  for  subintervals  at  finest  level.] 

do  »  =  l,...,2n'M" 

Evaluate  p-term  local  expansions  at  the  subset  of  {t/j  }  which  lie 

in  subinterval  1  at  level  nlevs. 
enddo 

6  0(3Ns)  Comment  [Add  nearest  neighbour  interactions  which  have  not  yet  been  ac¬ 

counted  for  via  expansions.] 

do  »  =  1, . .  .,2T4/et" 

For  each  yi  in  subinterval  i  at  level  nlevs,  compute  interactions  with  all 
Xj  in  subintervals  *  —  l,i,t  +  1,  and  add  to  well-separated  values, 
enddo 

Total  0(N  •  (2 p  +  10 p2/s  +  3s)) 


Remark  3.6  The  number  s  can  be  chosen  to  minimize  the  total  operation  count  of  the  algo¬ 
rithm,  which  yields  s  ss  2 p.  The  above  algorithm  then  requires  order 

13-JV-p  (81) 

arithmetic  operations. 

Remark  3.7  The  operation  count  for  Step  6  assumes  that  the  points  {xj, . . . ,  x^}  are  reason¬ 
ably  uniformly  distributed.  Highly  non-uniform  distributions  are  discussed  in  Section  4. 

3.4  A  More  Efficient  Algorithm 

Chebyshev  expansions  are  not  the  most  efficient  means  of  representing  interactions  between  well- 
separated  intervals.  All  the  matrix  operators  of  the  algorithm  are  numerically  rank-deficient, 
and  can  be  further  compressed  by  a  suitable  change  of  basis.  The  orthogonal  matrices  required 
for  this  basis  change  axe  obtained  via  singular  value  decompositions  (SVDs)  of  appropriate 
matrices. 

In  this  subsection  we  describe  a  more  efficient  version  of  the  one  dimensional  FMM  which  is 
based  upon  this  observation.  The  algorithm  is  very  similar  to  Algorithm  3.1  with  one  important 
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modification:  separate  changes  of  basis  are  needed  for  interactions  from  the  left  and  interactions 
from  the  right,  so  for  every  subinterval  we  maintain  two  sets  of  expansions,  leftward  expansions 
and  rightward  expansions. 

Remark  3.8  SVDs  can  also  be  used  for  the  efficient  representation  and  application  of  much 
broader  classes  of  smooth  linear  operators  (see  Section  8). 

We  will  require  some  additional  notation  for  the  algorithm  description  of  this  section.  The 
following  denotations  use  the  notation  of  Section  3.2. 

•  p  will  be  an  integer  denoting  the  numerical  rank  of  the  operators  to  be  compressed. 

•  t f  will  denote  the  j-th  Chebyshev  node  of  order  p  on  the  interval  [0, 1],  and  tf  will  denote 
the  y-th  Chebyshev  node  of  order  p  on  the  interval  [-1, 0]. 

•  Ul  and  Vi  will  denote  px  p  matrices,  each  of  whose  columns  forms  an  orthonormal  set, 
and  E  will  denote  a  px  p  diagonal  matrix  such  that 

WUlZVZ  -  Ml\\  <  e  •  \\Ml\\,  (82) 

where 

MUiJ) = i jihrr  (83) 

for  i,j  =  1, . .  .,p.  Ul  and  Vl  are  used  to  compress  leftward  expansions,  and  UlEVl  is 
effectively  the  numerical  SVD  of  M  l- 

•  Ur  and  Vr  will  denote  px  p  matrices,  each  of  whose  columns  forms  an  orthonormal  set, 
such  that 

-  Mr\)  <  e  •  |)Afft||,  (84) 

where 

JZT,’  <«> 

for  i,j  =  1, . .  .,p.  An  examination  of  the  matrices  Ml  and  Mr  reveals  that  Ur  and  Ul 
are  closely  related,  and  Vr  and  Vl  are  closely  related.  Ur  and  Vr  are  used  to  compress 
rightward  expansions,  and  UrUV^  is  effectively  the  numerical  SVD  of  Mr. 

•  and  will  be  p- vectors  denoting  respectively  the  left  and  right  fax-field  expansions 
for  subinterval  t  at  level  l.  These  will  be  defined  by  the  formulae: 

*u  =  ur  ■  *'.«•  (86) 

•  and  9f\  will  be  p- vectors  denoting  respectively  the  left  and  right  local  expansions  for 
subinterval  i  at  level  /.  These  will  be  defined  by  the  formulae: 

=  Vf  -*w, 

=  K  Hi..-  (87) 
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•  M£,  Mr,  and  will  be  p  x  p  matrices  for  obtaining  left  and  right  far-field  expan¬ 
sions  for  subintervals  in  terms  of  the  left  and  right  fax-field  expansions  of  their  children. 
These  will  be  defined  by  the  formulae 

Ml  =  Ul-ML-UL , 

Ml  =  UZ-MR-UL, 

Ml  =  UZ-Ml-Ur,  (88) 

Mr  =  UZ-Mr-Ur. 

•  Sr->  §l  Sr  will  be  p  x  p  matrices  for  obtaining  left  and  right  local  expansions  for 
subintervals  in  terms  of  the  left  and  right  local  expansions  of  their  parent.  These  will  be 
defined  by  the  formulae 

si  =  vl  ■  sL  •  vL, 

Si  =  Vl  ■  Sr  ■  VL, 

Si  =  vZ-Sl-Vr,  (89) 

Si  =  Vr  -  Sr-  Vr. 

•  Tl  and  Tl  will  be  p  x  p  matrices  for  obtaining  left  local  expansions  from  right  far-field 
expansions.  These  will  be  defined  by  the  formulae 

Tl  =  Vl  -Tz  •  UL, 

Tl  =  VZ-Ta-Ul.  (90) 

•  Tl  and  T*  will  be  p  x  p  matrices  denoting  for  obtaining  right  local  expansions  from  left 
far-field  expansions.  These  will  be  defined  by  the  formulae 

Ti  =  v£  t2-  Ur, 

Ti  =  VZ-T^Ur.  (91) 

Following  is  a  step-by-step  description  of  a  modified  version  of  Algorithm  3.1. 

Algorithm  3.2 

Step  Complexity  Description 

1  0(1)  Comment  [Input  problem  size,  N,  and  a  real  number  e  >  0.] 

Set  size  of  Chebyshev  expansions  p  =  [-  log5(e)"| ,  choose  p,  choose  s  and  set  level 
of  refinement  nlevs  =  flog2(Ar/s)l. 

2  0(2Np)  do  i  =  1, . .  .,2nlevt 

Form  a  p-term  left  feu-field  expansion  $£Jev< 

Form  a  p-term  right  far-field  expansion  <&* evt  i. 
enddo 
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0(4Np2/s )  do  /  =  nlevs  —  1, . . . ,  1 

do  i  =  1, . . . ,  21 

*£<  =  ^•*^.2,-1  +  Mi-*L  l,2t 

=  M?  •  +  Mg  ■ 

enddo 

enddo 


O(10Np2/s)  do  /  =  1, . . . ,  nlevs  —  1 
do  i  =  1, . . . ,  2l 

,2»+l  +  T?-*kx  ,2i+2 
*/+i,2i  •*£  +  !?  -«f+1  ,2»+2 

'Vf+l,2i-l  =  $1,  ‘  'Vf'i  +  Tf  •  i>2,_3 

*H-1,2«  =  ‘  1,21-2  +  ^  •  $f+l,2i-3 

enddo 

enddo 


5  0(2Np) 


6  0(3As) 


do  i=  l,...,2n,e’" 

Evaluate  and  add  left  and  right  p-term  local  expansions  ^nievt.i  +  ^nJe «.#,»• 
enddo 

do  t  =  lJ...)2n,<," 

For  each  point  in  subinterval  t  at  level  nlevs,  compute 
interactions  with  all  other  points  in  subintervals  i  —  1,  t, » +  1, 
and  add  to  far-field  values, 
enddo 


Total  0(N  ■  (4 p  +  Up2 Is  +  3s)) 


Remark  3.9  We  can  choose  s  to  minimize  the  total  operation  count  of  the  algorithm,  which 
yields  s  «  2p.  The  above  algorithm  then  requires  order 

17  -N-p  (92) 


arithmetic  operations. 

Remark  3.10  The  results  of  our  numerical  experiments  indicate  that,  for  a  fixed  precision  e 
and  corresponding  p,  p  (the  numerical  rank  of  the  matrix  operators  to  be  compressed)  is  approx¬ 
imately  p/2.  This  condition  together  with  (81)  and  (92)  leads  us  to  expect  that  Algorithm  3.2 
will  require  about  two-thirds  of  the  number  of  arithmetic  operations  needed  by  Algorithm  3.1. 

Remark  3.11  The  operation  count  for  Step  5  assumes  that  the  points  {xi,...,xjv}  are  rea¬ 
sonably  uniformly  distributed.  An  adaptive  version  of  this  algorithm  is  described  in  Section  4, 
and  is  capable  of  handling  highly  non-uniform  distributions  while  preserving  computational 
efficiency  and  accuracy. 


19 


4  The  Adaptive  FMM  in  One  Dimension 

The  algorithms  of  the  previous  section  have  one  drawback:  their  operation  count  is  quite  sensi¬ 
tive  to  the  distribution  of  points,  and  they  become  inefficient  for  highly  non-uniform  distribu¬ 
tions.  We  now  describe  an  adaptive  version  of  Algorithm  3.2  which  overcomes  this  deficiency,  its 
complexity  being  O(N)  independently  of  the  spacing  of  the  nodes.  This  versatility  is  achieved 
by  using  different  levels  of  subdivision  for  different  parts  of  the  computational  domain.  An  in¬ 
teger  s  is  fixed,  and  at  each  level  of  refinement,  we  subdivide  only  those  intervals  which  contain 
more  than  s  points.  At  each  level,  then,  a  list  of  non-empty  subintervals  is  maintained  whose 
members  are  the  result  of  the  selective  subdivision  of  intervals  at  the  previous  level.  This  policy 
eliminates  the  inefficiency  of  the  non-adaptive  version,  where,  at  the  finest  level  of  subdivision, 
we  may  encounter  intervals  with  very  few  or  very  many  points.  In  the  non-adaptive  version, 
each  interval  has  two  nearest  neighbors  of  the  same  size,  whereas  in  the  adaptive  version,  in¬ 
tervals  are  permitted  to  have  neighbors  of  differing  size.  Figure  3  depicts  a  subdivision  of  the 
computational  domain  for  a  non-uniform  distribution  of  points. 

Remark  4.1  The  idea  of  selectively  subdividing  the  computational  domain  is  taken  from  the 
two  dimensional  adaptive  FMM  of  [4j.  This  algorithm  requires  a  somewhat  more  elaborate  data 
structure  than  its  non-adaptive  counterpart  to  account  for  interactions  between  all  the  different 
sized  boxes.  The  adaptive  algorithm  for  problems  in  one  dimension  also  needs  additional 
bookkeeping  to  keep  track  of  interactions  between  all  the  different  sized  subintervals.  However, 
the  simplified  geometry  of  the  real  line  suggests  the  use  of  a  somewhat  more  efficient  data 
structure  than  that  of  the  two  dimensional  case  (see  Figure  3). 

Remark  4.2  It  is  clear  that  for  a  fixed  machine  precision  £,  only  certain  distributions  of 
points  will  yield  meaningful  results.  For  example,  the  points  xj  and  x2  are  indistinguishable  if 
|x2  —  xi  |  <  |  •  |*i  +  x2| .  To  avoid  such  cases  we  will  impose  that  the  minimum  distance  between 
two  points  must  be  greater  than  e  •  (b  —  a)  where  [a,  b]  is  the  computational  domain.  Under 
this  condition,  the  highest  level  of  refinement  of  the  computational  domain  is  bounded  above 

by  |log2(£)|. 

4.1  Notation 

We  require  some  additional  notation  for  the  description  of  the  adaptive  algorithm  to  supplement 
that  of  Sections  3.2  and  3.4. 

•  nlevs  will  denote  the  level  of  finest  subdivision  of  any  part  of  the  interval  [—1, 1]. 

•  For  a  fixed  precision,  £,  m  =  |  log2(£)|  will  denote  the  maximum  level  of  refinement  of  the 
interval  [-1, 1]  (see  Remark  4.2). 

•  Jj  will  denote  the  set  of  non-empty  subintervals  at  level  /,  i.e.  the  set  of  subintervals  at 
level  l  resulting  from  the  bisection  of  a  larger  interval  at  level  /  —  1. 

•  If  subinterval  isub  contains  more  than  s  points,  it  is  called  a  parent  subinterval,  and 
ilchild(isub)  and  irchild(isub )  will  denote  its  left  child  and  its  right  child  which  axe  the 
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Figure  3:  Hierarchy  of  subintervals  for  non-uniform  distribution. 


21 


subintervals  resulting  from  its  bisection.  Otherwise,  isub  is  a  called  a  childless  subinterval 
and  ilchild(tsub )  and  irckild(isub)  are  set  to  0. 

•  ilnbr(isub)  and  irnbr(isub)  will  denote  the  left  and  right  neighbors  for  subinterval  isub, 
which  are  the  smallest  adjacent  subintervals  at  the  same  level  of  refinement  or  a  coarser 
one. 

•  and  will  be  p-vectors  denoting  respectively  the  left  and  right  far-field  expansions 
for  subinterval  i. 

•  1®f'  and  will  be  p-vectors  denoting  respectively  the  left  and  right  local  expansions  for 
subinterval  i. 

4.2  Description  of  the  Algorithm 

This  section  contains  a  detailed  description  and  a  complexity  analysis  of  an  adaptive  version  of 
Algorithm  3.2. 

Algorithm  4.1 

Initialization  Step 

Comment  [Geometrical  preprocessing] 

Comment  [Input  problem  size,  N,  and  a  real  number  £  >  0.] 

Set  size  of  Cbebyshev  expansions  p  =  f  —  log5(e)],  choose  p,  choose  s  and  set  maximum  level  of  refine¬ 
ment  m  =  (l°g2(W/s)l  ■ 


h  =  {[-1,0],  [0,1]} 

do  /  =  1, . . . ,  m  while  h  is  non-empty 
do  isub  6  h 

if  tsu6  contains  more  than  5  points  then 

Add  ilchild(isub)  and  irchild(isub)  to  Ii+\ . 
else 

ilchild(isub)  =  0 
irchild(isub)  =  0 
endif 
enddo 
enddo 
nlevs  =  l 

do  /  =  1, . . . ,  nlevs 
do  isub  e  h 

if  isub  is  not  childless  then 

ilnbr(irchild(isub))  =  ilchild(isub) 
irnbr(Uchtld(isub))  =  irchild(isub) 
if  ilnbr(isub)  is  not  childless  then 

ilnbr(ilchild(i$ub))  =  irchild(ilnbr(isub)) 
else 

ilnbr(ilchild(i8ub ))  =  ilnbr(isub) 
endif 

if  irnbr(isub)  is  not  childless  then 
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irnbr(irchild(isub))  =  ilchild(irnbr(isub)) 
else 

irnbr(irchild(isub))  =  irnbr(isub ) 
endif 
endif 
end  do 
enddo 


Step  1 

Comment  [Upward  Pass] 


do  /  =  nlevs , . . . ,  1 
do  tsu6  €  h 

if  isub  is  a  childless  subinterval  then 

Form  a  p-term  far-field  expansion  ut. 
Form  a  p-term  far-field  expansion 
else 


*L> 


endif 

enddo 

enddo 


—  ■  ^fiehildiitub)  +  Mr  ■  ♦ frchild(itub ) 

=  M £  ■  Qf/chiidfiiui)  +  Mr  •  child(i,ub ) 


Step  2 

Comment  [Downward  Pass] 


do  l  =  1, . nlevs 
do  isub  €  Ii 

if  isub  is  a  childless  subinterval  then 

Evaluate  and  add  p-  term  local  expansions 
else 

> yL  _  cR  .  ihL 

Vilchild(itvb)  JL  *i#ui 

xvt  _  cR  .  At 

“irchild(it  ub)  JR  *«»«* 

At)  _  c R  A  It 

*  ,lch,)d(fub)  — 


t  tub 


&R  '  _  cfi  . 

“  irchild(isub)  JR  ”•'«< 

if  ilnbr(isub)  is  a  childless  subinterval  then 

Add  contribution  of  to  each  point  in  ilnbr(isub) 

Add  contribution  of  each  point  in  ilnbr{isub)  to  'ffrchw<j(iJut) 
else 

^ilchild(iivb)  ^  ilchildfisub)  ^"l  *  ^>»ic4*/d(»in4r(»<uk)) 

'^irefc»/d(«ju4)  ^  irehild(itub)  ^ irehild(ilnbr(itub ))  ^2  ^ilckild(ilnbr(itub)) 

endif 

if  irnbr(isub)  is  a  childless  subinterval  then 

Add  contribution  of  to  each  point  in  irnbr(isub) 

Add  contribution  of  each  point  ip.  irnbr(isub)  to 
else 

^ile4tld(tiub)  ’  ^ilc4»Id(»ro4r(»#v4))  ^2  *^ireH»2d(irn4r(tsv4)) 

d/R  —  tiiR  4.  t*  ■  4* 

virehild(itub)  virehi/d(ttub)  '  1 1  virehitd(irnbr(iiub)) 

endif 
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endif 

enddo 

enddo 

Step  3 

Comment  [Direct  Interactions] 

do  isub  =  1, . . . ,  nsub 

if  isub  is  childless  then 

For  each  point  in  subinterval  isub,  compute  interactions  with 
all  other  points  in  this  subinterval  and  in  adjacent  childless 
subintervals,  and  add  to  far-field  values, 
endif 
enddo 

Following  is  a  complexity  analysis  of  Algorithm  4.1.  Before  presenting  a  step-by-step  break¬ 
down  of  the  operation  counts,  we  need  two  lemmas.  These  lemmas  provide  upper  bounds  on 
the  numbers  of  subintervals  which  can  be  created  by  the  process  of  selective  subdivision. 

Lemma  4.1  For  any  subdivision  of  the  computational  domain  produced  by  Algorithm  4-1,  the 
number  of  childless  intervals  is  bounded  by 


2‘log2  (!)  *T‘  (93) 

Proof.  Each  parent  interval  at  level  l  contains  more  than  s  points  (otherwise  it  would  not  be 
further  subdivided).  Therefore,  the  total  number  of  parent  intervals  at  level  l  is  bounded  above 
by  N/s.  Each  parent  interval  has  two  children  by  definition,  so  the  number  of  childless  boxes  at 
any  level  l  is  bounded  above  by  2 N/s.  The  bound  (93)  follows  from  the  fact  that  the  number 
of  levels  of  subdivision  is  bounded  above  by  log2(l/s)  (see  Remark  4.2).  □ 


Lemma  4.2  For  any  subdivision  of  the  computational  domain  produced  by  Algorithm  4.1,  the 
total  number  of  intervals  is  bounded  by 

3'Iog2G)'T-  (94) 

Proof.  The  number  of  parent  intervals  at  level  l  is  bounded  above  by  N/s.  Each  of  these  parent 
intervals  has  two  children,  so  the  number  of  childless  boxes  at  any  level  /  is  bounded  above  by 
2 N/s.  Thus,  the  total  number  of  intervals  at  all  levels  (childless  and  parent)  is  bounded  by 
log2(  1/0  -(N/s  +  2N/s).  □ 

Following  is  a  step-by-step  breakdown  of  the  operation  counts  of  Algorithm  4.1. 
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Step 

1 

Complexity 

0(2pJV)+ 

Description 

Each  point  contributes  to  the  left  and  right  p-term  far-field  expansions 
of  the  childless  subinterval  in  which  it  lies. 

0(4p2mN/s ) 

For  each  parent  subinterval  (there  are  at  most  mN/s)  we  perform  4 
p  x  p  matrix-vector  products. 

2 

0(2pAT)+ 

We  evaluate  and  add  the  left  and  right  p-term  local  expansions  for 
each  of  the  N  points. 

0(4p2mlV/s)+ 

For  each  parent  subinterval  (there  are  at  most  mN/s)  we  perform  4 
p  x  p  matrix- vector  products. 

0{6p2mN/s)+ 

For  each  parent  subinterval  (there  axe  at  most  mN/s )  we  perform  at 
most  6  p  x  p  matrix-vector  products. 

0{4pmN) 

For  each  parent  subinterval  (there  axe  at  most  mN/s )  we  perform  at 
most  4  p  x  s  matrix-vector  products. 

3 

0(3Ns) 

Each  of  the  yj  interacts  directly  with  those  x*  which  lie  in  its  own 
subinterval  and  the  two  nearest  neighbors.  There  are  at  most  3s  of 
these  points  x*. 

Total  0(\4p2mN  /  s  +  4  pmN  +  4  pN  +  3  Ns) 


Remark  4.3  We  choose  s  ~  2p,  as  in  Algorithm  3.2  (see  Remark  3.10).  The  above  algorithm 
then  requires  order 

N  •  (llpro  +  lOp)  (95) 


arithmetic  operations. 


5  A  Fast  Algorithm  for  Polynomial  Interpolation 

In  this  section  we  return  to  the  original  problem  of  this  paper:  given  a  set  of  points  {xj , . . . ,  xjv} 
and  function  values  {/i,  . . . ,  /n},  evaluate  the  unique  interpolating  polynomial  at  the  points 
{j/i pat}.  Recall  from  (2)  that  the  Lagrange  interpolating  polynomial  defined  by  the  formula 


N  N 
i= i  fc=  l 


X  -Xk 
Xj  -  Xk 


can  be  rewritten  in  the  form 


N  S 

Pn(x)  =  JJ(*  -*k)  Y 

k—  1  7  —  1 


f, 


N 

n 

k=l 


Xj  ~  Xk 


(96) 


(97) 
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Furthermore, 


Pn(Vi)  =  rj  *  lilll 

in  —  t. 


for  l  —  1, . . . ,  n,  where  rj  and  sj  are  defined  by  the  formulae 


and, 


r,  = 


ij(Kr  -  xk)  =  e^k=iHvi  *k)f 

k=l 


sj  =  TT  — l—  =  «*>-**) 

Mi 


(98) 


(99) 


(100) 


Observation  5.1  5ums  of  the  form  XT  —  *fc)  can  also  be  evaluated  using  the  algorithms 
of  this  paper  ( see  Remark  3.3).  The  numbers  {r;}  and  {sj}  can  therefore  be  computed  in 
0(JVlog(j))  operations  according  to  (99)  and  (100). 


Following  is  a  description  of  an  algorithm  for  the  efficient  evaluation  of  expressions  of  the 
form  (98). 


Algorithm  5.1 
Step  Complexity 
Init  0(N  Iog(l)) 

1  O(N) 


2  0{N  log(i)) 

3  O(N) 


Total  0(N  log(I)) 


Description 

Compute  the  numbers  {r;}  and  {s;}. 
do  j '  =  l,n 

9i=fj 
end  do 

Compute  p,  =  J2j=i  9j/{y i  -  *j)  using  Algorithm  4.1  (or  Algorithm  3-2). 
do  l  =  l,n 

pi  =  Pi-  n 
end  do 


Remark  5.2  It  is  well  known  that  the  polynomial  interpolant  of  a  function  is  spectrally  accu¬ 
rate  when  the  function  is  tabulated  at  Chebyshev  or  Legendre  nodes  (which  are  clustered  near 
the  interval  ends),  whereas  the  interpolation  errors  can  be  arbitrarily  large  when  the  function 
is  tabulated  at  general  distributions  of  points  (see,  for  example,  [5],  [14]).  It  is  expected  that 
many  practical  applications  of  Algorithm  5.1  will  assume  nonuniformly  spaced  nodes  which  are 
clustered  near  the  extremities  of  the  interval. 


6  Applications  in  Numerical  Integration  and  Differentiation 


The  fast  polynomial  interpolation  algorithm  of  this  paper  can  be  applied  to  a  variety  of  prob¬ 
lems.  One  such  example  is  discussed  in  this  section.  Here  we  will  consider  the  folowing  problem: 
given  a  set  of  points  {n, . . . ,  xN}  and  function  values  {A, . . . ,  /jv),  evaluate  the  integrals  and 
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derivatives  of  the  interpolating  polynomial  at  the  points  {s*}.  In  other  words,  we  wish  to 
compute 

J  t  Pn(x)<1x  and  /^(a:*)  (101) 

for  k  =  1, N,  where  Ps  is  the  interpolating  polynomial  for  the  function  values  {fk}  at  the 
points  {x*;},  defined  by  the  Lagrange  formula 


N  N 


pn(x) =£/*•  n  — ■ 

i=l  k= 1  X]  Xk 


(102) 


We  will  make  use  of  the  following  lemma,  which  may  be  found  in  the  appendix  to  [7].  This 
lemma  describes  formulae  for  the  integration  and  differentiation  of  Chebyshev  expansions. 


Lemma  6.1  Let  Pn  be  a  polynomial  given  by  a  Chebyshev  series 

N- 1 

Pn(x)  =  ^2  ak  ■ Tk(x ). 

k= o 

Then,  the  integral  of  P^j  has  a  series  expansion  of  the  form 


rx  N 

/  pN(t)dt  =  j2b>‘-  r*(*)> 

k=0 


where 


(103) 


bk  =  2k  (ak~1  ~  °*+1^  /or  2  <  A:  <  N, 
fei  =  -  ■  (2ao  -  02), 

fc>  =  -s-d-i  y-ij, 

and  the  derivative  of  Ppj  has  a  series  expansion  of  the  form 

£pN(x)  =  N^dk-Tk{x), 

k= 0 

where 


(105) 


(106) 


dk  =  ^2  for  l  <  k  <  N  -  2, 


j=k+ 1 
j+k  odd 


1  N 

do  =  9  '  J2  i  '  aP 
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Remark  6.1  It  can  be  shown  that  the  process  of  numerical  differentiation  via  Chebyshev  series 
has  a  condition  number  proportional  to  N2,  whereas  the  process  of  numerical  integration  via 
Chebyshev  series  has  a  condition  number  bounded  by  2.  Thus,  numerical  differentiation  of  this 
type  is  not  usually  favored  when  large  scale  calculations  are  being  performed.  On  the  other 
hand,  numerical  integration  is  virtually  insensitive  to  problem  size,  and  is  a  powerful  tool  in  the 
solution  of  certain  classes  of  differential  equations,  for  example  (for  a  more  detailed  discussion, 
see  [9]). 

The  two  algorithms  described  below  perform  the  integration  and  differentiation  of  the  La¬ 
grange  interpolating  polynomial  of  a  function  which  is  tabulated  at  nodes  other  than  Chebyshev. 
In  these  descriptions  we  wil  assume  that  Xf.  6  [—1, 1]  for  k  =  1, . . . , N,  and  that  are 

Chebyshev  nodes  of  order  N  on  the  interval  [—1, 1]. 

Algorithm  6.1 

Step  Complexity  Description 

1  0(7Vlog(l))  Interpolate  from  {xt}  to  {!*}  using  Algorithm  5.1. 

2  O(NlogN)  Compute  Chebyshev  coefficients  using  fast  cosine  transform. 

3  O(JV)  Integrate  Chebyshev  series  using  (105). 

4  0(N  log  TV)  Evaluate  new  series  at  Chebyshev  nodes  using  fast  cosine  transform. 

5  0(AT log(j))  Interpolate  from  {<*}  to  {xi}  using  Algorithm  5.1. 

Total  0(N  -  log  N  +  N  ■  log( j)) 

Algorithm  6.2 

Step  Complexity  Description 

1  0(N  log(i))  Interpolate  from  {**}  to  {!*}  using  Algorithm  5.1. 

2  0{N  log  N)  Compute  Chebyshev  coefficients  using  fast  cosine  transform. 

3  O(N)  Differentiate  Chebyshev  series  using  (107). 

4  0(N  log  N)  Evaluate  new  series  at  Chebyshev  nodes  using  fast  cosine  transform. 

5  0(lVlog(i))  Interpolate  from  {<*}  to  {**}  using  Algorithm  5.1. 

Total  0(NlogN  +  Nlog($)) 


7  Implementation  and  Numerical  Results 

We  have  written  FORTRAN  implementations  of  the  algorithms  of  this  paper  using  double 
precision  arithmetic,  and  have  applied  these  programs  to  a  variety  of  situations. 

Two  technical  details  of  our  implementations  appear  to  be  worth  mentioning  here: 

1.  Each  implementation  consists  of  two  main  subroutines:  the  first  is  an  initialization  stage 
in  which  the  elements  of  the  various  matrices  employed  by  the  algorithms  are  precom¬ 
puted  and  stored,  and  the  second  is  an  evaluation  stage  in  which  these  matrices  are 
applied.  Successive  application  of  the  linear  transformations  to  multiple  vectors  requires 
the  initialization  to  be  performed  only  once. 

2.  The  parameters  for  each  algorithm  were  chosen  to  retain  maximum  precision  while  min¬ 
imizing  the  CPU  time  requirements.  The  values  we  used  for  the  numerical  examples  of 
this  section  were  p  =  16,  p  =  9,  3  =  16  and  nlevs  =  log 2(n/s). 
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Our  implementations  of  the  algorithms  of  this  paper  have  been  tested  on  the  the  Sun 
SPARCstation  1  for  a  variety  of  input  data.  Four  experiments  are  described  in  this  section, 
and  their  results  are  summarized  in  Tables  1-9.  These  tables  contain  error  estimates  and  CPU 
time  requirements  for  the  algorithms,  with  all  computations  performed  in  double  precision 
arithmetic. 

The  table  entries  are  described  below. 


•  The  first  column  in  each  table  contains  the  problem  size  N,  which  was  chosen  to  be  a 
power  of  2  ranging  from  64  to  4096  for  each  example. 

•  The  second  and  third  columns  in  each  table  contain  the  relative  co-norm  error  E<x>  and 
the  relative  2-norm  error  Ei  for  each  result. 

•  The  fourth  and  fifth  columns  in  each  table  contain  CPU  timings  for  the  initialization  and 
evaluation  stages  of  the  algorithm. 

•  The  sixth  column  in  each  of  Tables  1-4  contains  CPU  timings  for  the  corresponding  direct 
calculation. 

•  The  last  column  in  each  of  Tables  1-4  contains  CPU  timings  for  an  FFT  of  the  same  size. 

Following  are  the  descriptions  of  the  experiments,  and  the  tables  of  numerical  results. 
Example  1. 

The  purpose  of  this  example  is  to  demonstrate  the  performance  of  the  one  dimensional  FMM 
algorithms  of  this  paper  when  applied  to  uniform  distributions  of  points  in  the  interval  [—1, 1]. 
We  considered  the  problem  of  evaluating  the  expressions 


N 


fc=l  V  Xk 


(108) 


for  j  =  1, . . . ,  N,  where  the  numbers  {a:*}  and  {y*}  were  chosen  according  to  the  formulae 

2k  —  1 


Xk  =  -1  + 


N  ’ 


Vk 


,  ,  2  •  (fc  +  0.1  ■  6k)  —  1 


N 


(109) 

(110) 


for  fc  =  1,. . .,  N,  and  {£*}  were  random  numbers  in  [-1, 1].  In  addition,  {ajt}  were  randomly 
chosen  from  the  interval  [0,1].  This  calculation  was  performed  in  three  ways: 

1.  via  Algorithm  3.1, 

2.  via  Algorithm  3.2,  and, 

3.  via  a  direct  implementation  of  the  formula  (108). 
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Results  of  this  experiment  are  presented  in  Tables  1  and  2. 

Example  2. 

Here  we  again  considered  the  problem  of  evaluating  (108),  as  in  Example  1,  but  we  replaced 
the  equispaced  nodes  with  non-uniform  distributions,  and  applied  both  the  adaptive  and  non- 
adaptive  versions  of  the  one  dimensional  FMM  algorithms  to  this  problem.  The  numbers  {x*} 
were  chosen  to  be  Legendre  nodes  of  order  N  (i.e.  the  roots  of  the  jV-th  order  Legendre 
polynomial)  on  the  interval  (—1,1],  and  {y*}  were  chosen  to  be  Chebyshev  nodes  of  order  N 
on  the  same  interval.  Once  again,  {ajt}  were  randomly  chosen  from  the  interval  [0, 1].  This 
calculation  was  performed  in  three  ways: 

1.  via  Algorithm  3.2, 


2.  via  Algorithm  4.1,  and, 

3.  via  a  direct  implementation  of  the  formula  (108). 

Results  of  this  experiment  are  presented  in  Tables  3  and  4. 

Example  3. 

Here  we  considered  the  problem  of  evaluating  the  Lagrange  interpolant  of  the  function 

f(x)  =  e"4*2  (111) 


tabulated  at  Legendre  nodes  {xi, . .  .,x/v)  on  the  interval  [—1, 1].  The  interpolant,  defined  by 
the  formula 

j=l  far  1  Xk 

k¥=j 

was  evaluated  at  Chebyshev  nodes  {yi,...,y/v}  on  the  same  interval.  This  calculation  was 
performed  in  three  ways: 


1.  via  the  formula 

PN(yi)  =  e£?=1  M*-**)  ■  Y  J&l  ■  e' 

yi  -  Xj 


using  Algorithm  5.1, 


(113) 


2.  via  a  direct  implementation  of  the  formula 


N  N  r/  \  N  •. 

Pn(vi)  =  il(y/  -  x*)  •  £  ~  ■  n  — 

Jt=l  j=ly'  x->  k=\XJ 

k^j 


and, 


(114) 


3.  via  a  direct  implementation  of  the  formula 

PN(yi)  =  eEL  .  y  l&L  .  e~  LLm 

■  J  III  —  T  ■ 


(115) 
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Results  of  this  experiment  are  presented  in  Tables  5-7 . 

Remark  7.1  Table  6  only  contains  results  for  N  <  1024  because  for  larger  problem  sizes  the 
computation  of  the  products  in  the  expression  (114)  generates  overflow  and  underflow  errors 
for  this  particular  computer  software  and  hardware. 


Example  4. 

In  this  example  we  tested  the  integration  and  differentiation  algorithms  of  Section  6  on  the 
functions 


f{x)  =  4x  •  (x2  -  1), 


(116) 


ff(x)  =  J*if(t)dt  =  (x2-1)2, 


(117) 


which  were  tabulated  at  Legendre  nodes  of  order  N  on  the  interval  [—1, 1]-  Algorithm  6.1  was 
used  to  compute  the  integrals  of  /,  and  Algorithm  6.2  was  used  to  compute  the  derivatives  of 
g  at  these  nodes.  Results  of  this  experiment  are  presented  in  Tables  8  and  9. 


The  following  observations  can  be  made  from  Tables  1-9,  and  are  in  agreement  with  results 
of  our  more  extensive  experiments  for  the  particular  computer  architecture,  implementations, 
and  values  of  N  under  consideration. 

1.  The  algorithms  permit  high  accuracy  to  be  attained,  and  the  observed  errors  are  in 
accordance  with  the  theoretically  obtained  error  bounds. 

2.  The  CPU  timings  for  each  algorithm  grow  linearly,  as  expected,  with  the  problem  size  N . 

3.  Algorithms  3.2  and  4.1  are  about  10  times  as  costly  as  an  FFT  of  the  same  size. 

4.  The  algorithms  break  even  with  the  corresponding  direct  methods  at  around  N  =  32  if 
the  initialization  time  is  ignored,  and  at  about  N  =  512  if  it  is  included. 

5.  The  evaluation  stage  for  Algorithm  3.1  is  approximately  50  percent  slower  than  the  evalua¬ 
tion  stage  for  Algorithm  3.2,  as  expected  (see  Remark  3.10).  However,  if  the  initialization 
times  are  included,  Algorithm  3.1  is  more  than  twice  as  fast  as  Algorithm  3.2.  Thus, 
Algorithm  3.1  should  be  used  whenever  the  linear  transformation  of  this  example  is  to  be 
applied  to  a  single  vector,  and  Algorithm  3.2  should  be  used  whenever  the  same  linear 
transformation  is  to  be  applied  to  many  different  vectors. 

6.  Tables  3  and  4  indicate  that  the  evaluation  stage  of  the  adaptive  algorithm  is  only  about 
30  percent  faster  than  the  non-adaptive  one  for  Legendre  nodes.  These  nodes  were  chosen 
because  functions  tabulated  there  are  very  well  approximated  by  their  polynomial  inter- 
polants  (see  Remark  5.2).  For  highly  non-uniform  distributions,  however,  the  adaptive 
algorithm  will  be  significantly  faster. 

7.  It  is  apparent  from  Tables  6  and  7  that  the  first  of  the  direct  methods  used  in  Example  3 
is  the  more  accurate  and  more  efficient  of  the  two.  However,  we  could  not  use  this  method 
for  values  of  N  larger  than  1024  due  to  numerical  instability  (see  Remark  7.1). 
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Table  1:  Example  1,  Numerical  Results  for  Algorithm  3.1. 


N 

Errors 

Timings  (sec.) 

Eoo 

e2 

Init. 

Eval. 

Direct 

eft 

64 

0.204  E-14 

0.298  E-14 

0.22 

■ 

0.351  E-15 

0.560  E-15 

0.32 

BBS 

' 

0.998  E-15 

0.332  E-14 

0.49 

giliwH 

ppT 

512 

0.517  E-15 

0.553  E-14 

0.94 

0.147 

1.34 

1  v" 

1024 

0.944  E-15 

0.679  E-14 

1.71 

5.37 

ilPi 

2048 

0.190  E-14 

0.955  E-14 

3.40 

UTirfll 

21.82 

4096 

0.401  E-14 

0.328  E-13 

6.58 

1.283 

87.79 

0.132 

Table  2:  Example  1,  Numerical  Results  for  Algorithm  3.2. 
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Table  3:  Example  2,  Numerical  Results  for  Algorithm  3.2. 


Timings  (sec.) 


Eval.  Direct 


Errors 

Eoo 

e2 

0.746  E-15 
0.758  E-15 
0.146  E-14 
0.325  E-14 
0.128  E-13 
0.700  E-14 
0.775  E-14 

0.171  E-14 
0.216  E-14 
0.316  E-14 
0.897  E-14 
0.264  E-13 
0.211  E-13 
0.318  E-13 

Table  4:  Example  2,  Numerical  Results  for  Algorithm  4.1. 


I 


N 

Errors 

Timings  (sec.) 

Eoo 

Ei 

Init. 

Eval. 

64 

0.351  E-13 

0.330  E-13 

0.44 

128 

0.542  E-13 

0.453  E-13 

0.74 

m 

256 

0.628  E-13 

0.340  E-13 

1.30 

msmvm 

512 

0.877  E-13 

0.459  E-13 

2.49 

0.147 

1024 

0.136  E-12 

0.497  E-13 

4.97 

0.299 

2048 

0.160  E-12 

0.555  E-13 

10.30 

0.611 

4096 

0.204  E-12 

0.692  E-13 

20.99 

1.239 

Table  5:  Example  3,  Numerical  Results  for  Algorithm  5.1. 


N 

Eoo 

Ei 

Timings  (sec.) 

64 

0.134  E-14 

0.789  E-15 

0.04 

128 

0.222  E-14 

0.126  E-14 

0.14 

256 

0.688  E-14 

0.263  E-14 

0.58 

512 

0.161  E-13 

0.598  E-14 

2.33 

1024 

0.318  E-13 

0.105  E-13 

9.52 

2048 

- 

- 

- 

4096 

- 

- 

- 

Table  6:  Example  3,  Numerical  Results  for  Direct  Method  1. 


N 

Eoo 

Ei 

Timings  (sec.) 

64 

0.740  E-14 

0.433  E-14 

0.14 

128 

0.266  E-13 

0.141  E-13 

0.54 

256 

0.976  E-13 

0.319  E-13 

2.19 

512 

0.198  E-12 

0.951  E-13 

8.70 

1024 

0.619  E-12 

0.295  E-12 

34.92 

2048 

0.199  E-ll 

0.789  E-12 

141.84 

4096 

0.588  E-ll 

0.218  E-ll 

567.92 

Table  7:  Example  3,  Numerical  Results  for  Direct  Method  2. 
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N 

|  Errors 

|  Timings  (sec.) 

Eoo 

e2 

Init. 

Eval. 

64 

0.766  E-13 

0.384  E-13 

m 

0.031 

128 

0.135  E-12 

0.419  E-13 

1.4 

8 

0.077 

256 

0.309  E-12 

0.529  E-13 

2.6 

2 

0.155 

512 

0.383  E-12 

0.637  E-13 

9 

0.329 

1024 

0.268  E-12 

0.596  E-13 

10.01 

0.661 

2048 

0.566  E-12 

0.729  E-13 

20.36 

1.325 

4096 

0.683  E-12 

0.102  E-12 

40.50 

2.651 

Table  8:  Example  4,  Numerical  Results  for  Algorithm  6.1. 


N 

Errors 

Timings  (sec.) 

Eoo 

e2 

Init. 

Eval. 

64 

0.101  E-10 

0.225  E-ll 

0.94 

0.032 

128 

0.520  E-10 

0.890  E-ll 

1.53 

0.079 

256 

0.221  E-09 

0.320  E-10 

2.71 

0.164 

512 

0.348  E-08 

0.338  E-09 

4.98 

0.331 

1024 

0.272  E-07 

0.193  E-08 

10.04 

0.716 

2048 

0.190  E-06 

0.960  E-08 

20.85 

1.309 

4096 

0.801  E-06 

0.528  E-07 

40.90 

2.621 

Table  9:  Example  4,  Numerical  Results  for  Algorithm  6.2. 
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8.  Tables  8  and  9  show  rapidly  growing  e-rors  for  spectral  differentiation,  but  not  for  spec¬ 
tral  integration,  demonstrating  the  well  known  difference  in  stability  between  these  two 
processes  (see  Remark  6.1)  Timings  for  the  two  methods  are  similar,  as  expected. 

9.  The  initialization  stage  is  more  costly  than  the  evaluation  stage  for  all  of  the  algorithms. 
Implementing  the  algorithms  in  two  stages  gives  considerable  time  savings  whenever  the 
same  linear  transformation  is  to  be  applied  to  multiple  vectors. 

8  Conclusions  and  Generalizations 

In  this  paper,  *ve  have  described  a  collection  of  algorithms  which  includes: 

1.  efficient  versions  of  the  Fast  Multipole  Method  in  one  dimension, 

2.  a  fast  algorithm  for  polynomial  interpolation  on  the  line,  and 

3.  fast  algorithms  for  the  integration  and  differentiation  of  functions  tabulated  at  nodes 
other  than  Chebyshev. 

The  number  of  arithmetic  operations  required  by  each  algorithm  is  proportional  to  either  N  or 
N  ■  log  N ,  where  the  constant  of  proportionality  depends  on  the  desired  accuracy. 

Several  obvious  generalizations  of  the  results  of  this  paper  are  discussed  below. 

1.  The  algorithms  described  in  Sections  3  and  4  axe  all  designed  for  evaluating  sums  of  the 
form 

fc= 1  * 

However,  they  can  be  modified  to  evaluate  expressions  of  a  more  general  form,  given  by 
the  formula 

N 

f(x)  ='52ak-<Kx  ~  *k),  (119) 

k=l 

where  4>  is  singular  at  0  but  smooth  everywhere  else.  Examples  of  functions  with  this 
property  include  <f>( x)  =  log(x)  and  <fr(x)  —  1/x2,  which  are  readily  obtained  by  integrating 
and  differentiating  the  expression  (64).  Another  example  of  interest  is  4>(x)  =  l/tan(x) 
which  arises  in  the  formulation  of  the  trigonometric  interpolation  problem.  This  case  is 
currently  being  investigated,  and  results  will  be  reported  at  a  later  date. 

While  the  algorithmic  procedures  for  different  kernels  <j>  will  be  virtually  identical,  differ¬ 
ent  sets  of  formulae  will  be  needed  for  the  manipulation  of  the  Chebyshev  fax-field  and 
local  expansions  which  are  required  by  the  algorithms.  These  formulae  are  obtained  by 
constructing  analogs  of  Theorems  2.5,  2.7,  2.8,  2.9  and  2.10  for  the  particular  function  4>. 

2.  The  problems  considered  in  this  paper  all  involve  the  evaluation  of  an  JV-term  series  at  N 
points.  Straightforward  modifications  to  the  algorithms  of  this  paper  will  allow  the  effi¬ 
cient  evaluation  of  these  iV-term  series  at  M  points,  where  M  ^  N .  These  modifications 
have  been  implemented. 
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3.  The  use  of  the  Singular  Value  Decomposition  in  the  approximation  of  smooth  functions 
can  be  extended  to  many  families  of  linear  operators.  Specifically,  this  approach  may  be 
used  to  substantially  accelerate  the  Fast  Laplace  Transform  of  [13]  and  the  Fast  Legendre 
Transform  of  [2]. 
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